Search CORE

111 research outputs found

Recommended from our members

The Case for Browser Provenance

Author: Margo Daniel Wyatt
Seltzer Margo I.
Publication venue: USENIX Association
Publication date: 06/10/2011
Field of study

In our increasingly networked world, web browsers are important applications. Originally an interface tool for accessing distributed documents, browsers have become ubiquitous, incorporating a significant portion of user interaction. A modern browser now also reads email, plays media, edits documents, and runs applications. Consequently, browsers process large quantities of data, and must record metadata, such as history, to help users manage their data. Most of the metadata that modern browsers record is actually provenance – metadata that captures the causality and lineage of data obtained via the browser. We demonstrate that characterizing browser metadata as provenance and then applying techniques from the provenance research community enables new browser functionality. For example, provenance can improve both history and web search by indicating contextual and personal relationships between data items. Users can also answer complex questions about the origins of their data by querying provenance. Our initial results suggest these features are feasible to implement and could perform well in modern browsers.Engineering and Applied Science

Harvard University - DASH

Performance Introspection of Graph Databases

Author: Macko Peter
Margo Daniel Wyatt
Seltzer Margo I.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013
Field of study

The explosion of graph data in social and biological networks, recommendation systems, provenance databases, etc. makes graph storage and processing of paramount importance. We present a performance introspection framework for graph databases, PIG, which provides both a toolset and methodology for understanding graph database performance. PIG consists of a hierarchical collection of benchmarks that compose to produce performance models; the models provide a way to illuminate the strengths and weaknesses of a particular implementation. The suite has three layers of benchmarks: primitive operations, composite access patterns, and graph algorithms. While the framework could be used to compare different graph database systems, its primary goal is to help explain the observed performance of a particular system. Such introspection allows one to evaluate the degree to which systems exploit their knowledge of graph access patterns. We present both the PIG methodology and infrastructure and then demonstrate its efficacy by analyzing the popular Neo4j and DEX graph databases.Engineering and Applied Science

CiteSeerX

Crossref

Harvard University - DASH

Recommended from our members

Mining the Web for Medical Hypothesis: A Proof-of-Concept System

Author: Maclean Diana
Seltzer Margo I.
Publication venue
Publication date: 14/05/2012
Field of study

As the prevalence of blogs, discussion forums, and online news services continues to grow, so too does the portion of this Web content that relates to health and medicine. We propose that everyday, medically-oriented Web content is a valuable and viable data source for medical hypothesis generation and testing, despite its being noisy. In this paper, we present a proof-of-concept system supporting this notion. We construct a corpus comprising news articles relating to the drugs Vioxx, Naproxen and Ibuprofen, that were published between 1998-2002. Using this corpus, we show that there was a signiﬁcant link between Vioxx and the concept “Myocardial Infarction” well before the drug was withdrawn from the market in 2004. Indeed, within the Vioxx-related content, the concept ranks amongst the top 3.3% in terms of importance. When compared with the Naproxen and Ibuprofen control literatures, the term occurs signiﬁcantly more frequently in the Vioxx-related content.Engineering and Applied Science

Harvard University - DASH

Recommended from our members

A General-Purpose Provenance Library

Author: Macko Peter
Seltzer Margo I.
Publication venue: USENIX Associaiton
Publication date: 26/11/2012
Field of study

Most provenance capture takes place inside particular tools - a workflow engine, a database, an operating system, or an application. However, most users have an existing toolset - a collection of different tools that work well for their needs and with which they are comfortable. Currently, such users have limited ability to collect provenance without disrupting their work and changing environments, which most users are hesitant to do. Even users who are willing to adopt new tools, may realize limited benefit from provenance in those tools if they do not integrate with their entire environment, which may include multiple languages and frameworks. We present the Core Provenance Library (CPL), a portable, multi-lingual library that application programmers can easily incorporate into a variety of tools to collect and integrate provenance. Although the manual instrumentation adds extra work for application programmers, we show that in most cases, the work is minimal, and the resulting system solves several problems that plague more constrained provenance collection systems.Engineering and Applied Science

Harvard University - DASH

Recommended from our members

Provenance Map Orbiter: Interactive Exploration of Large Provenance Graphs

Author: Macko Peter
Seltzer Margo I.
Publication venue: USENIX Association
Publication date: 07/10/2011
Field of study

Provenance systems can produce enormous provenance graphs that can be used for a variety of tasks from determining the inputs to a particular process to debugging entire workflow executions or tracking difficult-to-find dependencies. Visualization can be a useful tool to sup- port such tasks, but graphs of such scale (thousands to millions of nodes) are notoriously difficult to visualize. This paper presents the Provenance Map Orbiter, a tool for interactively exploring large provenance graphs using graph summarization and semantic zoom. It presents its users with a high-level abstracted view of the graph and the ability to incrementally drill down to the details.Engineering and Applied Science

Harvard University - DASH

Recommended from our members

Hierarchical File Systems Are Dead

Author: Murphy Nicholas
Seltzer Margo I.
Publication venue: USENIX Association
Publication date: 21/09/2011
Field of study

For over forty years, we have assumed hierarchical file system namespaces. These namespaces were a rudimentary attempt at simple organization. As users have begun to interact with increasing amounts of data and are increasingly demanding search capability, such a simple hierarchical model has outlasted its usefulness. For this reason, we should design file systems whose organizations map to the ways we access and manipulate data now. We present a new file system architecture in which we replace the hierarchical namespace with a tagged, search-based one.Engineering and Applied Science

Harvard University - DASH

Recommended from our members

Multicore OSes: Looking Forward from 1991, er, 2011

Author: Holland David A
Seltzer Margo I.
Publication venue: USENIX Association
Publication date: 07/10/2011
Field of study

Upcoming multicore processors, with hundreds of cores or more in a single chip, require a degree of parallel scalability that is not currently available in today’s system software. Based on prior experience in the super-computing sector, the likely trend for multicore processors is away from shared memory and toward shared-nothing architectures based on message passing. In light of this, the lightweight messages and channels programming model, found among other places in Erlang, is likely the best way forward. This paper discusses what adopting this model entails, describes the architecture of an OS based on this model, and outlines a few likely implementation challenges.Engineering and Applied Science

Harvard University - DASH

Recommended from our members

BURRITO: Wrapping Your Lab Notebook in Computational Infrastructure

Author: Guo Philip J.
Seltzer Margo I.
Publication venue: USENIX Association
Publication date: 26/11/2012
Field of study

Researchers in fields such as bioinformatics, CS, finance, and applied math have trouble managing the numerous code and data files generated by their computational experiments, comparing the results of trials executed with different parameters, and keeping up-to-date notes on what they learned from past successes and failures. We created a Linux-based system called BURRITO that automates aspects of this tedious experiment organization and notetaking process, thus freeing researchers to focus on more substantive work. BURRITO automatically captures a researcher's computational activities and provides user interfaces to annotate the captured provenance with notes and then make queries such as, "Which script versions and command-line parameters generated the output graph that this note refers to?"Engineering and Applied Science

Harvard University - DASH

Recommended from our members

Provenance as First Class Cloud Data

Author: Muniswamy-Reddy Kiran-Kumar
Seltzer Margo I.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 02/11/2011
Field of study

Digital provenance is meta-data that describes the ancestry or history of a digital object. Most work on provenance focuses on how provenance increases the value of data to consumers. However, provenance is also valuable to storage providers. For example, provenance can provide hints on access patterns, detect anomalous behavior, and provide enhanced user search capabilities. As the next generation storage providers, cloud vendors are in the unique position to capitalize on this opportunity to incorporate provenance as a fundamental storage system primitive. To date, cloud offerings have not yet done so. We provide motivation for providers to treat provenance as first class data in the cloud and based on our experience with provenance in a local storage system, suggest a set of requirements that make provenance feasible and attractive.Engineering and Applied Science

Harvard University - DASH

Recommended from our members

Parallelization by Simulated Tunneling

Author: Appavoo Jonathan
Seltzer Margo I.
Waterland Amos
Publication venue: USENIX Association
Publication date: 26/11/2012
Field of study

As highly parallel heterogeneous computers become commonplace, automatic parallelization of software is an increasingly critical unsolved problem. Continued progress on this problem will require large quantities of information about the runtime structure of sequential programs to be stored and reasoned about. Manually formalizing all this information through traditional approaches, which rely on semantic analysis at the language or instruction level, has historically proved challenging. We take a lower level approach, eschewing semantic analysis and instead modeling von Neumann computation as a dynamical system, i.e., a state space and an evolution rule, which gives a natural way to use probabilistic inference to automatically learn powerful representations of this information. This model enables a promising new approach to automatic parallelization, in which probability distributions empirically learned over the state space are used to guide speculative solvers. We describe a prototype virtual machine that uses this model of computation to automatically achieve linear speedups for an important class of deterministic, sequential Intel binary programs through statistical machine learning and a speculative, generalized form of memoization.Engineering and Applied Science

Harvard University - DASH